Big data active learning based on MapReduce
ZHAI Junhai, ZHANG Sufang, WANG Cong, SHEN Chu, LIU Xiaomeng
Journal of Computer Applications    2018, 38 (10): 2759-2763.   DOI: 10.11772/j.issn.1001-9081.2018041141
To address the problem that traditional active learning algorithms can only handle small and medium-sized data sets, a big data active learning algorithm based on MapReduce was proposed. Firstly, a classifier was trained with the Extreme Learning Machine (ELM) algorithm on an initial training set, and the outputs of the classifier were transformed into a posterior probability distribution by the softmax function. Secondly, the unlabeled big data set was partitioned into l subsets, which were deployed to a cloud computing platform with l nodes. On each node, the information entropy of every instance in the local subset was calculated with the trained classifier, and the q instances with the maximum information entropy were selected for labeling; the resulting l × q labeled instances were then added to the training set. These steps were repeated until the predefined termination criterion was satisfied. Comparative experiments against an ELM-based active learning algorithm were conducted on four data sets: Artificial, Skin, Statlog and Poker. Experimental results show that the proposed algorithm completes active instance selection on all four data sets, whereas the ELM-based active learning algorithm completes it only on the smallest data set, indicating that the proposed algorithm outperforms the ELM-based active learning algorithm.
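The selection step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the classifier outputs are taken to be plain lists of raw scores, the l nodes are simulated by a sequential loop rather than a real MapReduce job, and all function names (`softmax`, `entropy`, `select_top_q`, `select_across_partitions`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(outputs: Sequence[float]) -> List[float]:
    """Map raw classifier outputs to a probability distribution."""
    m = max(outputs)  # subtract the max for numerical stability
    exps = [math.exp(o - m) for o in outputs]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log); zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_top_q(subset_outputs: Sequence[Sequence[float]], q: int) -> List[int]:
    """Indices (within one subset) of the q instances with the highest entropy."""
    scores = [entropy(softmax(o)) for o in subset_outputs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:q]


def select_across_partitions(
    partitions: Sequence[Sequence[Sequence[float]]], q: int
) -> List[Tuple[int, int]]:
    """Simulate the l nodes one after another (assumption: no real cluster).

    Returns one (node_id, local_index) pair per selected instance,
    so at most l * q pairs in total.
    """
    selected = []
    for node_id, subset_outputs in enumerate(partitions):
        for idx in select_top_q(subset_outputs, q):
            selected.append((node_id, idx))
    return selected
```

In a real deployment, `select_top_q` would play the role of the per-node map task, and the selected instances would be labeled and merged into the training set before the classifier is retrained.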